78 research outputs found

Ariadne: Analysis for Machine Learning Program

Author: Allain Allison
Dolby Julian
Reinen Jenna
Shinnar Avraham
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2018

Machine learning has transformed domains like vision and translation, and is now increasingly used in science, where the correctness of such code is vital. Python is popular for machine learning, in part because of its wealth of machine learning libraries, and is felt to make development faster; however, this dynamic language has less support for error detection at code creation time than tools like Eclipse. This is especially problematic for machine learning: given its statistical nature, code with subtle errors may run and produce results that look plausible but are meaningless. This can vitiate scientific results. We report on Ariadne: applying a static framework, WALA, to machine learning code that uses TensorFlow. We have created a static analysis for Python, a type system for tracking tensors (TensorFlow's core data structures), and a data flow analysis to track their usage. We report on how it was built and present some early results.

arXiv.org e-Print Archive

Crossref

Who you gonna call? Analyzing Web Requests in Android Applications

Author: Dolby Julian
Lhoták Ondřej
Rapoport Marianna
Suter Philippe
Wittern Erik
Publication date: 18/05/2017

Relying on ubiquitous Internet connectivity, applications on mobile devices frequently perform web requests during their execution. They fetch data for users to interact with, invoke remote functionalities, or send user-generated content or meta-data. These requests collectively reveal common practices of mobile application development, like what external services are used and how, and they point to possible negative effects like security and privacy violations, or impacts on battery life. In this paper, we assess different ways to analyze what web requests Android applications make. We start by presenting dynamic data collected from running 20 randomly selected Android applications and observing their network activity. Next, we present a static analysis tool, Stringoid, that analyzes string concatenations in Android applications to estimate constructed URL strings. Using Stringoid, we extract URLs from 30,000 Android applications, and compare the performance with a simpler constant extraction analysis. Finally, we present a discussion of the advantages and limitations of dynamic and static analyses when extracting URLs, as we compare the data extracted by Stringoid from the same 20 applications with the dynamically collected data.
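
The idea of estimating URL strings from concatenations can be illustrated with a minimal sketch. This is not Stringoid's implementation: the expression classes (`Const`, `Unknown`, `Concat`), the `estimate` function, the wildcard notation, and the example URL are all hypothetical, chosen only to show how statically unknown parts of a concatenation might be kept as placeholders.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    """A string literal found in the program (hypothetical node type)."""
    value: str


@dataclass(frozen=True)
class Unknown:
    """A value that cannot be resolved statically, e.g. user input."""


@dataclass(frozen=True)
class Concat:
    """Concatenation of two string expressions."""
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Unknown, Concat]

# Assumed placeholder notation for the unresolved parts of a URL.
WILDCARD = "(.*)"


def estimate(expr: Expr) -> str:
    """Fold a concatenation tree into a URL pattern with wildcards."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Unknown):
        return WILDCARD
    return estimate(expr.left) + estimate(expr.right)


# Models code such as: "https://api.example.com/users/" + user_id + "/posts"
url = Concat(
    Concat(Const("https://api.example.com/users/"), Unknown()),
    Const("/posts"),
)
print(estimate(url))
```

A constant-only extraction would recover just the two literals separately; folding the concatenation keeps them in one pattern, which is the contrast the abstract draws.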

arXiv.org e-Print Archive

Crossref

Statically Checking Web API Requests in JavaScript

Author: Dolby Julian
Laredo Jim A.
Wittern Erik
Ying Annie T. T.
Zheng Yunhui
Publication date: 15/02/2017

Many JavaScript applications perform HTTP requests to web APIs, relying on the request URL, HTTP method, and request data to be constructed correctly by string operations. Traditional compile-time error checks, such as those that catch a call to a non-existent method in Java, are not available for checking whether such requests comply with the requirements of a web API. In this paper, we propose an approach to statically check web API requests in JavaScript. Our approach first extracts a request's URL string, HTTP method, and the corresponding request data using an inter-procedural string analysis, and then checks whether the request conforms to given web API specifications. We evaluated our approach by checking whether web API requests in JavaScript files mined from GitHub are consistent or inconsistent with publicly available API specifications. From the 6575 requests in scope, our approach determined whether the request's URL and HTTP method were consistent or inconsistent with web API specifications with a precision of 96.0%. Our approach also correctly determined whether extracted request data was consistent or inconsistent with the data requirements with a precision of 87.9% for payload data and 99.9% for query data. In a systematic analysis of the inconsistent cases, we found that many of them were due to errors in the client code. The proposed checker can be integrated with code editors or with continuous integration tools to warn programmers about code containing potentially erroneous requests. Comment: International Conference on Software Engineering, 201
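
The second step described above, checking an extracted request against a specification, might look roughly like the following sketch. It is an illustration under stated assumptions, not the paper's checker: the `SPEC` layout, the endpoint templates, the host `api.example.com`, and the function names are hypothetical, and the extraction step is assumed to have already produced a method and a URL.

```python
import re
from urllib.parse import urlparse

# Hypothetical specification: a base URL plus allowed (method, path template) pairs.
SPEC = {
    "base": "https://api.example.com",
    "endpoints": {("GET", "/users/{id}"), ("POST", "/users")},
}


def template_to_regex(template: str) -> re.Pattern:
    """Turn a path template like /users/{id} into a regex for one path."""
    parts = re.split(r"(\{[^/{}]+\})", template)
    pattern = "".join(
        "[^/]+" if part.startswith("{") else re.escape(part) for part in parts
    )
    return re.compile(pattern)


def is_consistent(method: str, url: str, spec=SPEC) -> bool:
    """Report whether a request's method and URL match some endpoint in the spec."""
    request, base = urlparse(url), urlparse(spec["base"])
    if (request.scheme, request.netloc) != (base.scheme, base.netloc):
        return False
    return any(
        method.upper() == allowed and template_to_regex(template).fullmatch(request.path)
        for allowed, template in spec["endpoints"]
    )


print(is_consistent("GET", "https://api.example.com/users/42"))
print(is_consistent("DELETE", "https://api.example.com/users/42"))
```

With this assumed specification, the first request is consistent and the second is not, because `DELETE` is not listed for that path.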

arXiv.org e-Print Archive

Crossref

Opportunities in Software Engineering Research for Web API Consumption

Author: Dolby Julian
Laredo Jim A.
Slominski Aleksander A.
Wittern Erik
Ying Annie
Young Christopher C.
Zheng Yunhui
Publication date: 18/05/2017

Nowadays, invoking third-party code increasingly involves calling web services via their web APIs, as opposed to the more traditional scenario of downloading a library and invoking the library's API. However, there are also new challenges for developers calling these web APIs. In this paper, we highlight a broad set of these challenges and argue for resulting opportunities for software engineering research to support developers in consuming web APIs. We outline two specific research threads in this context: (1) web API specification curation, which enables us to know the signatures of web APIs, and (2) static analysis that is capable of extracting URLs, HTTP methods, etc. of web API calls. Furthermore, we present new work on how we combine (1) and (2) to provide IDE support for application developers consuming web APIs. As web APIs are used broadly, research in supporting the consumption of web APIs offers exciting opportunities. Comment: Erik Wittern and Annie Ying are both first author

arXiv.org e-Print Archive

Crossref

Static Analysis of Shape in TensorFlow Programs

Author: Antoniadis Anastasios
Dolby Julian
Grech Neville
Lagouvardos Sifis
Smaragdakis Yannis
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th European Conference on Object-Oriented Programming (ECOOP 2020)
Publication date: 01/01/2020

Machine learning has been widely adopted in diverse science and engineering domains, aided by reusable libraries and quick development patterns. The TensorFlow library is probably the best-known representative of this trend and most users employ the Python API to its powerful back-end. TensorFlow programs are susceptible to several systematic errors, especially in the dynamic typing setting of Python. We present Pythia, a static analysis that tracks the shapes of tensors across Python library calls and warns of several possible mismatches. The key technical aspects are a close modeling of library semantics with respect to tensor shape, and an identification of violations and error-prone patterns. Pythia is powerful enough to statically detect (with 84.62% precision) 11 of the 14 shape-related TensorFlow bugs in the recent Zhang et al. empirical study, an independent slice of real-world bugs.
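
What "tracking the shapes of tensors across library calls" can mean is sketched below for a single operation. This is a simplified illustration, not Pythia's analysis: the `Shape` encoding (with `None` for a dimension unknown at analysis time), the `ShapeMismatch` exception, and the `matmul_shape` function are hypothetical, and the rule modeled is only the standard one for rank-2 matrix multiplication.

```python
from typing import Optional, Tuple

Dim = Optional[int]        # None stands for a dimension unknown at analysis time
Shape = Tuple[Dim, ...]


class ShapeMismatch(Exception):
    """Raised when two shapes cannot be combined by the modeled operation."""


def matmul_shape(a: Shape, b: Shape) -> Shape:
    """Model the result shape of a rank-2 matrix multiplication a @ b."""
    if len(a) != 2 or len(b) != 2:
        raise ShapeMismatch(f"expected rank-2 operands, got {a} and {b}")
    inner_a, inner_b = a[1], b[0]
    # Only report a mismatch when both inner dimensions are known and differ.
    if inner_a is not None and inner_b is not None and inner_a != inner_b:
        raise ShapeMismatch(f"inner dimensions differ: {a} x {b}")
    return (a[0], b[1])


# A batch of flattened 28x28 images times a weight matrix: batch size unknown.
print(matmul_shape((None, 784), (784, 10)))

try:
    matmul_shape((None, 784), (100, 10))
except ShapeMismatch as err:
    print("warning:", err)
```

Treating unknown dimensions as compatible with anything keeps the sketch from raising false alarms, at the cost of missing mismatches that depend on values it cannot see.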

Dagstuhl Research Online Publication Server

Static Analysis of Shape in TensorFlow Programs (Artifact)

Author: Antoniadis Anastasios
Dolby Julian
Grech Neville
Lagouvardos Sifis
Smaragdakis Yannis
Publication venue: DARTS - Dagstuhl Artifacts Series. DARTS, Volume 6, Issue 2, Special Issue of the 34th European Conference on Object-Oriented Programming (ECOOP 2020)
Publication date: 01/01/2020

Dagstuhl Research Online Publication Server

Finding Bugs in Web Applications Using Dynamic Test Generation and Explicit State Model Checking

Author: Artzi Shay
Dig Danny
Dolby Julian
Ernst Michael D.
Kiezun Adam
Paradkar Amit
Tip Frank
Publication date: 26/03/2009

Web script crashes and malformed dynamically-generated web pages are common errors, and they seriously impact the usability of web applications. Current tools for web-page validation cannot handle the dynamically generated pages that are ubiquitous on today's Internet. We present a dynamic test generation technique for the domain of dynamic web applications. The technique utilizes both combined concrete and symbolic execution and explicit-state model checking. The technique generates tests automatically, runs the tests capturing logical constraints on inputs, and minimizes the conditions on the inputs to failing tests, so that the resulting bug reports are small and useful in finding and fixing the underlying faults. Our tool Apollo implements the technique for the PHP programming language. Apollo generates test inputs for a web application, monitors the application for crashes, and validates that the output conforms to the HTML specification. This paper presents Apollo's algorithms and implementation, and an experimental evaluation that revealed 302 faults in 6 PHP web applications

CiteSeerX

DSpace@MIT

Finding Bugs In Dynamic Web Applications

Author: Artzi Shay
Dig Danny
Dolby Julian
Ernst Michael D.
Kiezun Adam
Paradkar Amit
Tip Frank
Publication date: 06/02/2008

Web script crashes and malformed dynamically-generated web pages are common errors, and they seriously impact usability of web applications. Current tools for web-page validation cannot handle the dynamically-generated pages that are ubiquitous on today's Internet. In this work, we apply a dynamic test generation technique, based on combined concrete and symbolic execution, to the domain of dynamic web applications. The technique generates tests automatically and minimizes the bug-inducing inputs to reduce duplication and to make the bug reports small and easy to understand and fix. We implemented the technique in Apollo, an automated tool that found dozens of bugs in real PHP applications. Apollo generates test inputs for the web application, monitors the application for crashes, and validates that the output conforms to the HTML specification. This paper presents Apollo's algorithms and implementation, and an experimental evaluation that revealed a total of 214 bugs in 4 open-source PHP web applications.

CiteSeerX

DSpace@MIT

LakeBench: Benchmarks for Data Discovery over Data Lakes

Author: Abdelaziz Ibrahim
Chaudhury Subhajit
Dolby Julian
Hassanzadeh Oktie
Khatiwada Aamod
Kokel Harsha
Pedapati Tejaswini
Samulowitz Horst
Srinivas Kavitha
Publication date: 09/07/2023

Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes

arXiv.org e-Print Archive